
145 research outputs found

Generator-Retriever-Generator: A Novel Approach to Open-domain Question Answering

Author: Abdallah Abdelrahman
Jatowt Adam
Publication date: 20/07/2023

Open-domain question answering (QA) tasks usually require the retrieval of relevant information from a large corpus to generate accurate answers. We propose a novel approach called Generator-Retriever-Generator (GRG) that combines document retrieval techniques with a large language model (LLM), by first prompting the model to generate contextual documents based on a given question. In parallel, a dual-encoder network retrieves documents that are relevant to the question from an external corpus. The generated and retrieved documents are then passed to a second LLM, which generates the final answer. By combining document retrieval and LLM generation, our approach addresses the challenges of open-domain QA, such as generating informative and contextually relevant answers. GRG outperforms the state-of-the-art generate-then-read and retrieve-then-read pipelines (GENREAD and RFiD), improving their performance by at least +5.2, +4.2, and +1.6 on the TriviaQA, NQ, and WebQ datasets, respectively. We provide code, datasets, and checkpoints at https://github.com/abdoelsayed2016/GRG

arXiv.org e-Print Archive

Citation recommendation: approaches and datasets

Author: Färber Michael
Jatowt Adam
Publication venue: Springer
Publication date: 03/09/2020

Citation recommendation describes the task of recommending citations for a given text. Due to the overload of published scientific works in recent years on the one hand, and the need to cite the most appropriate publications when writing scientific texts on the other hand, citation recommendation has emerged as an important research topic. In recent years, several approaches and evaluation data sets have been presented. However, to the best of our knowledge, no literature survey has been conducted explicitly on citation recommendation. In this article, we give a thorough introduction to automatic citation recommendation research. We then present an overview of the approaches and data sets for citation recommendation and identify differences and commonalities using various dimensions. Last but not least, we shed light on the evaluation methods and outline general challenges in the evaluation and how to meet them. We restrict ourselves to citation recommendation for scientific publications, as this document type has been studied the most in this area. However, many of the observations and discussions included in this survey are also applicable to other types of text, such as news articles and encyclopedic articles

KITopen

Citation Recommendation: Approaches and Datasets

Author: Färber Michael
Jatowt Adam
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 14/05/2020

Citation recommendation describes the task of recommending citations for a given text. Due to the overload of published scientific works in recent years on the one hand, and the need to cite the most appropriate publications when writing scientific texts on the other hand, citation recommendation has emerged as an important research topic. In recent years, several approaches and evaluation data sets have been presented. However, to the best of our knowledge, no literature survey has been conducted explicitly on citation recommendation. In this article, we give a thorough introduction to automatic citation recommendation research. We then present an overview of the approaches and data sets for citation recommendation and identify differences and commonalities using various dimensions. Last but not least, we shed light on the evaluation methods and outline general challenges in the evaluation and how to meet them. We restrict ourselves to citation recommendation for scientific publications, as this document type has been studied the most in this area. However, many of the observations and discussions included in this survey are also applicable to other types of text, such as news articles and encyclopedic articles. Comment: to be published in the International Journal on Digital Libraries

arXiv.org e-Print Archive

KITopen

Exploring the State of the Art in Legal QA Systems

Author: Abdallah Abdelrahman
Jatowt Adam
Piryani Bhawna
Publication date: 13/04/2023

Answering questions related to the legal domain is a complex task, primarily due to the intricate nature and diverse range of legal document systems. Providing an accurate answer to a legal query typically necessitates specialized knowledge in the relevant domain, which makes this task all the more challenging, even for human experts. Question answering (QA) systems are designed to generate answers to questions asked in human languages. They use natural language processing to understand questions and search through information to find relevant answers. QA has various practical applications, including customer service, education, research, and cross-lingual communication. However, such systems face challenges such as improving natural language understanding and handling complex and ambiguous questions. At this time, there is a lack of surveys that discuss legal question answering. To address this problem, we provide a comprehensive survey that reviews 14 benchmark datasets for question answering in the legal field and presents a comprehensive review of the state-of-the-art Legal Question Answering deep learning models. We cover the different architectures and techniques used in these studies and the performance and limitations of these models. Moreover, we have established a public GitHub repository where we regularly upload the most recent articles, open data, and source code. The repository is available at: https://github.com/abdoelsayed2016/Legal-Question-Answering-Review

arXiv.org e-Print Archive

ScholarSight: Visualizing Temporal Trends of Scientific Concepts

Author: Färber Michael
Jatowt Adam
Nishioka Chifumi
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019

2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL): June 2 2019 to June 6 2019, Champaign, IL, USA. In this paper, we present a system for exploring the temporal trends of scientific concepts. Scientific concepts were captured by extracting noun phrases and entities from all computer science papers of arXiv.org. Our system allows users to review the time series of numerous concepts and to identify positively and negatively trending concepts. By applying clustering techniques and cluster analysis visualizations, it can also present concepts which share the same usage patterns over time. Our system can be beneficial both for ordinary researchers of any field and for researchers working in bibliometrics and scientometrics who want to investigate the evolution of scientific concepts.

Kyoto University Research Information Repository

Dataset for Temporal Analysis of English-French Cognates

Author: Coustaty Mickaël
Doucet Antoine
Frossard Esteban
Hengchen Simon
Jatowt Adam
Publication venue: European Language Resources Association (ELRA)
Publication date: 13/05/2020

Languages change over time and, thanks to the abundance of digital corpora, their evolutionary analysis using computational techniques has recently gained much research attention. In this paper, we focus on creating a dataset to support investigating the similarity in evolution between different languages. We look in particular into the similarities and differences between the use of corresponding words across time in English and French, two languages from different linguistic families yet with shared syntax and close contact. For this we select a set of cognates in both languages and study their frequency changes and correlations over time. We propose a new dataset for computational approaches of synchronized diachronic investigation of language pairs, and subsequently show novel findings stemming from the cognate-focused diachronic comparison of the two chosen languages. To the best of our knowledge, the present study is the first in the literature to use computational approaches and large data to make a cross-language diachronic analysis. Peer reviewed

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Helsingin yliopiston digitaalinen arkisto

A city-wide examination of fine-grained human emotions through social media analysis.

Author: Jatowt Adam
Jeszenszky Peter
Kawai Yukiko
Siriaraya Panote
Zhang Yihong
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2023

The proliferation of Social Media and Open Web data has provided researchers with a unique opportunity to better understand human behavior at different levels. In this paper, we show how data from Open Street Map and Twitter could be analyzed and used to portray detailed Human Emotions at a city-wide level in two cities, San Francisco and London. Neural Network classifiers for fine-grained emotions were developed, tested and used to detect emotions from tweets in the two cities. The detected emotions were then matched to key locations extracted from Open Street Map. Through an analysis of the resulting data set, we highlight the effect different days, locations and POI neighborhoods have on the expression of human emotions in the cities.

PubMed Central

Bern Open Repository and Information System (BORIS)